Large-scale evaluation of automated clinical note de-identification and its impact on information extraction
Abstract
OBJECTIVE (1) To evaluate a state-of-the-art natural language processing (NLP)-based approach to automatically de-identify a large set of diverse clinical notes. (2) To measure the impact of de-identification on the performance of information extraction algorithms applied to the de-identified documents.
MATERIAL AND METHODS A cross-sectional study that included 3503 stratified, randomly selected clinical notes (over 22 note types) from five million documents produced at one of the largest US pediatric hospitals. Sensitivity, precision, and F value were computed for two automated de-identification systems that remove all 18 HIPAA-defined protected health information elements. Performance was assessed against a manually generated 'gold standard', and statistical significance was tested. The automated de-identification performance was also compared with that of two humans on a 10% subsample of the gold standard. The effect of de-identification on the performance of subsequent medication extraction was measured.
RESULTS The gold standard included 30 815 protected health information elements and more than one million tokens. The most accurate NLP method had 91.92% sensitivity (R) and 95.08% precision (P) overall. The performance of the system was indistinguishable from that of human annotators (the annotators' performance was 92.15% (R)/93.95% (P) and 94.55% (R)/88.45% (P) overall, while the best system obtained 92.91% (R)/95.73% (P) on the same text). Automated de-identification had minimal impact on the utility of the narrative notes for subsequent information extraction, as measured by the sensitivity and precision of medication name extraction.
DISCUSSION AND CONCLUSION NLP-based de-identification shows excellent performance that rivals the performance of human annotators. Furthermore, unlike manual de-identification, the automated approach scales up to millions of documents quickly and inexpensively.
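The evaluation measures named in the abstract (sensitivity, also called recall; precision; and F value) can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. It assumes that each protected health information element is represented as a hypothetical `(start, end, label)` tuple, that a prediction counts as correct only on an exact match with the gold standard, and that the F value is the balanced F1 (the abstract does not say which variant was used).

```python
from typing import Set, Tuple

# Hypothetical representation of one PHI element: (start offset, end offset, label).
Span = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact-match scoring against the gold standard."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0  # recall == sensitivity
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def f1_from_rates(precision: float, recall: float) -> float:
    """Balanced F1 (harmonic mean) from precision and recall given on the same scale."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Toy example with made-up spans: two of three predictions match the gold standard.
gold = {(0, 4, "NAME"), (10, 18, "DATE"), (25, 30, "PHONE")}
predicted = {(0, 4, "NAME"), (10, 18, "DATE"), (40, 45, "NAME")}
p, r, f = precision_recall_f1(gold, predicted)

# F1 implied by the overall figures reported in the abstract (P = 95.08%, R = 91.92%),
# under the assumption that the F value is the balanced F1.
reported_f1 = f1_from_rates(95.08, 91.92)
```

Under the balanced-F1 assumption, the overall precision and sensitivity reported for the most accurate system correspond to an F value of roughly 93.5%; this figure is derived here and is not stated in the abstract.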